
 similarity judgment


Alignment with human representations supports robust few-shot learning

Neural Information Processing Systems

Should we care whether AI systems have representations of the world that are similar to those of humans? We provide an information-theoretic analysis that suggests that there should be a U-shaped relationship between the degree of representational alignment with humans and performance on few-shot learning tasks. We confirm this prediction empirically, finding such a relationship in an analysis of the performance of 491 computer vision models. We also show that highly-aligned models are more robust to both natural adversarial attacks and domain shifts. Our results suggest that human alignment is often a sufficient, but not necessary, condition for models to make effective use of limited data, be robust, and generalize well.
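The abstract does not say how representational alignment is scored. One common way to quantify it, shown here as a minimal sketch and not necessarily the paper's own procedure, is the Spearman rank correlation between a model's pairwise similarities and human similarity judgments over the same items. The function names (`alignment_score`, `average_ranks`) and the use of cosine similarity are assumptions made for this illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def alignment_score(model_embeddings, human_similarity):
    """Spearman correlation between model and human pairwise similarities.

    model_embeddings: list of vectors, one per item (hypothetical input).
    human_similarity: square matrix of human similarity ratings for the
    same items, in the same order (hypothetical input).
    """
    pairs = list(combinations(range(len(model_embeddings)), 2))
    model_sims = [cosine(model_embeddings[i], model_embeddings[j]) for i, j in pairs]
    human_sims = [human_similarity[i][j] for i, j in pairs]
    # Spearman correlation is Pearson correlation computed on ranks.
    return pearson(average_ranks(model_sims), average_ranks(human_sims))


# Toy example: three items, human ratings that order the pairs the same way.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
human = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
score = alignment_score(embeddings, human)
```

A rank correlation only compares the ordering of pairs, so it is insensitive to the scale on which humans and the model express similarity.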



They live in the same physical world and are intimately familiar with the materials that comprise it, but they would have significant difficulty expressing their values and generalizing the results of an experiment they observe together. The alchemist would likely learn poorly from examples of a reaction demonstrated by the chemist, not having the right inductive biases for the way the world actually works.


Supplementary material for "Improving neural network representations using human similarity judgments"

Neural Information Processing Systems

Figure A.1: Among all hyperparameter combinations considered in our grid search, a combination of ( ...
We used a compute time of approximately 5600 CPU-hours of 2.90GHz Intel Xeon Gold ...
In this section, we outline our anomaly detection experimental setting in more detail.
Given a dataset (e.g., CIFAR-10) with ...
In contrast to the "one-vs-rest" setting, in LOO we define one class of the ...
In both "one-vs-rest" and LOO AD settings, we evaluate model representations in the following way: ...
We show the pairs of items that change the most in distance in Table B.1.
... "stethoscope", which are semantically unrelated but perhaps have some slight visual similarity, tend ...
We show the results in Fig. B.1.
Table B.1: Distances between pairs of individual items from THINGS, ranked by the relative change in cosine ...
The top items move much closer together under naive alignment, while the bottom ones move much farther apart.
Figure B.1: How does the global structure of the representations change after alignment?



Learning Human-like Representations to Enable Learning Human Values

Andrea H. Wynn

Neural Information Processing Systems

How can we build AI systems that can learn any set of individual human values both quickly and safely, avoiding causing harm or violating societal standards for acceptable behavior during the learning process? We explore the effects of representational alignment between humans and AI agents on learning human values.






Aligning Video Models with Human Social Judgments via Behavior-Guided Fine-Tuning

arXiv.org Artificial Intelligence

Humans intuitively perceive complex social signals in visual scenes, yet it remains unclear whether state-of-the-art AI models encode the same similarity structure. We study (Q1) whether modern video and language models capture human-perceived similarity in social videos, and (Q2) how to instill this structure into models using human behavioral data. To address this, we introduce a new benchmark of over 49,000 odd-one-out similarity judgments on 250 three-second video clips of social interactions, and discover a modality gap: despite the task being visual, caption-based language embeddings align better with human similarity than any pretrained video model. We close this gap by fine-tuning a TimeSformer video model on these human judgments with our novel hybrid triplet-RSA objective using low-rank adaptation (LoRA), aligning pairwise distances to human similarity. This fine-tuning protocol yields significantly improved alignment with human perceptions on held-out videos in terms of both explained variance and odd-one-out triplet accuracy. Variance partitioning shows that the fine-tuned video model increases shared variance with language embeddings and explains additional unique variance not captured by the language model. Finally, we test transfer via linear probes and find that human-similarity fine-tuning strengthens the encoding of social-affective attributes (intimacy, valence, dominance, communication) relative to the pretrained baseline. Overall, our findings highlight a gap in pretrained video models' social recognition and demonstrate that behavior-guided fine-tuning shapes video representations toward human social perception.
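Odd-one-out triplet accuracy, one of the two evaluation measures named above, can be illustrated with a short sketch. It assumes each human judgment is stored as a triplet of item indices plus the index the participant picked as the odd one out, and that the model's choice is the item left over after the most similar pair is grouped. The function names and the use of cosine similarity are assumptions for this illustration, not details taken from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def predict_odd_one_out(embeddings, triplet):
    """Return the index of the item the model treats as the odd one out.

    The two most similar items are grouped together, and the remaining
    third item is the prediction.
    """
    i, j, k = triplet
    # Map each candidate odd item to the similarity of the other two.
    pair_similarity = {
        k: cosine(embeddings[i], embeddings[j]),
        j: cosine(embeddings[i], embeddings[k]),
        i: cosine(embeddings[j], embeddings[k]),
    }
    return max(pair_similarity, key=pair_similarity.get)


def triplet_accuracy(embeddings, judgments):
    """Fraction of human judgments that the model's choice matches.

    judgments: list of ((i, j, k), odd_index) pairs (hypothetical format).
    """
    correct = sum(
        predict_odd_one_out(embeddings, triplet) == odd
        for triplet, odd in judgments
    )
    return correct / len(judgments)


# Toy example: items 0 and 1 are close, items 2 and 3 are close.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
judgments = [
    ((0, 1, 2), 2),  # model agrees: 0 and 1 are the closest pair
    ((0, 2, 3), 0),  # model agrees: 2 and 3 are the closest pair
    ((1, 2, 3), 2),  # model disagrees: it picks item 1
]
accuracy = triplet_accuracy(embeddings, judgments)  # 2 of 3 judgments matched
```

With three items per trial, a model choosing at random would match about one third of the judgments, which is the natural baseline for reading this score.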


Uncovering the Computational Ingredients of Human-Like Representations in LLMs

arXiv.org Artificial Intelligence

The ability to translate diverse patterns of inputs into structured patterns of behavior has been thought to rest on both humans' and machines' ability to learn robust representations of relevant concepts. The rapid advancement of transformer-based large language models (LLMs) has led to a diversity of computational ingredients -- architectures, fine-tuning methods, and training datasets among others -- but it remains unclear which of these ingredients are most crucial for building models that develop human-like representations. Further, most current LLM benchmarks are not suited to measuring representational alignment between humans and models, making benchmark scores unreliable for assessing whether current LLMs are making progress towards becoming useful cognitive models. We address these limitations by first evaluating a set of over 70 models that vary widely in their computational ingredients on a triplet similarity task, a method well established in the cognitive sciences for measuring human conceptual representations, using concepts from the THINGS database. Comparing human and model representations, we find that models that undergo instruction fine-tuning and have larger dimensionality of attention heads are among the most human-aligned, while multimodal pretraining and parameter size have limited bearing on alignment. Correlations between alignment scores and scores on existing benchmarks reveal that while some benchmarks (e.g., MMLU) are better suited than others (e.g., MUSR) for capturing representational alignment, no existing benchmark is capable of fully accounting for the variance of alignment scores, demonstrating their insufficiency in capturing human-AI alignment. Taken together, our findings help highlight the computational ingredients most essential for advancing LLMs towards models of human conceptual representation and address a key benchmarking gap in LLM evaluation.